In [1]:
# Loading packages
import PIL
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
import random

import torch
import torchvision
import torch.autograd as autograd
import torchvision.transforms as T

Pretrained Model

For this project, you will use a pre-trained deep neural network, SqueezeNet, which is lightweight and runs fast on CPUs. Run the code below to load a pre-trained SqueezeNet from the PyTorch official model zoo.

In [2]:
# Test and set the device.
if torch.cuda.is_available():
    device = 'cuda:0'
else:
    device = 'cpu'
print('Use', device)

# Download and load the pretrained SqueezeNet model.
model = torchvision.models.squeezenet1_1(pretrained=True).to(device)

# Disable the gradient computation with respect to model parameters.
for param in model.parameters():
    param.requires_grad = False
Use cuda:0

Data

For Task#1 and Task#2, use the images in the folder Project1\images, where the filenames are the corresponding class labels. For example, 182.png is an image of a Border Terrier, which is class 182 in the ImageNet dataset. Please refer to this Gist snippet for a complete list. The images are from the ImageNet validation set, so the pre-trained model has never "seen" them.

For Task#3, you may use the images in folder Project1\style or any other images you like.

Helper Functions

Most pre-trained models are trained on images that had been preprocessed by subtracting the per-color mean and dividing by the per-color standard deviation. Here are a few helper functions for performing and undoing this preprocessing.

In [3]:
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(img, size=(224, 224)):
    transform = T.Compose([
        T.Resize(size),
        T.ToTensor(),
        T.Normalize(mean=IMAGENET_MEAN.tolist(),
                    std=IMAGENET_STD.tolist()),
        T.Lambda(lambda x: x[None]),
    ])
    return transform(img)

def deprocess(img, should_rescale=True):
    transform = T.Compose([
        T.Lambda(lambda x: x[0]),
        T.Normalize(mean=[0, 0, 0], std=(1.0 / IMAGENET_STD).tolist()),
        T.Normalize(mean=(-IMAGENET_MEAN).tolist(), std=[1, 1, 1]),
        T.Lambda(rescale) if should_rescale else T.Lambda(lambda x: x),
        T.ToPILImage(),
    ])
    return transform(img)

def rescale(x):
    low, high = x.min(), x.max()
    x_rescaled = (x - low) / (high - low)
    return x_rescaled

def blur_image(X, sigma=1):
    X_np = X.cpu().clone().numpy()
    X_np = gaussian_filter1d(X_np, sigma, axis=2)
    X_np = gaussian_filter1d(X_np, sigma, axis=3)
    X.copy_(torch.Tensor(X_np).type_as(X))
    return X

Task#1 Adversarial Attack

The concept of "image gradients" can be used to study the stability of a network. Consider a state-of-the-art deep neural network that generalizes well on an object recognition task. We expect such a network to be robust to small perturbations of its input, because small perturbations cannot change the object category of an image. However, it was shown in the following paper [1] that by applying an imperceptible non-random perturbation to a test image, it is possible to arbitrarily change the network's prediction.

[1] Szegedy et al, "Intriguing properties of neural networks", ICLR 2014

Given an image and a target class, we can perform gradient ascent over the image to maximize the score of the target class, stopping when the network classifies the image as the target class. While the perturbations seem negligible to humans, the network classifies the perturbed images wrongly.
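The single-step update used in this procedure can be sketched as follows (a minimal sketch; `model` here is any differentiable callable returning class scores, and the names are illustrative, not the assignment's required implementation):

```python
import torch

def fool_step(X, target_y, model, learning_rate=1.0):
    """One normalized gradient-ascent step on the target-class score:
    dX = learning_rate * g / ||g||_2, where g = d s_target / dX."""
    X = X.clone().detach().requires_grad_(True)
    score = model(X)[0, target_y]   # raw score of the target class
    score.backward()
    g = X.grad / X.grad.norm()      # normalize the gradient
    with torch.no_grad():
        X = X + learning_rate * g
    return X.detach()
```

Iterating this step until the argmax of the scores equals `target_y` yields the fooling image.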

Read the paper, and then implement the function make_adversarial_attack below to generate "fooling images". For each image in Project1/images with class label $c$, generate a fooling image that will be classified into class $c-1-d$, where $d$ is the last digit of your student number. Save each fooling image into the folder Project1/fooling_images with the filename {true_class}_{target_class}.png. (Optional) You may confirm that the fooling image 182_9.png in the folder will be wrongly classified as ostrich (class 9 in the ImageNet dataset).

For the image 182.png, show the difference map between the original image and the fooling image, and save it as 182_1x_diff.png. Magnify the difference by 10 times and save the resulting map as 182_10x_diff.png.

In [4]:
def make_adversarial_attack(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.

    Inputs:
    - X: Input image; Tensor of shape (1, 3, 224, 224)
    - target_y: An integer in the range [0, 1000)
    - model: A pretrained CNN

    Returns:
    - X_fooling: An image that is close to X, but that is classified as target_y
    by the model.
    """
    
    model.eval()
    
    # Initialize our fooling image to the input image
    X_fooling = X.clone().detach()
    X_fooling.requires_grad = True

    # you may change the learning rate and the number of iterations
    learning_rate = 1

    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as   #
    # the class target_y. You should perform gradient ascent on the score of the #
    # target class, stopping when the model is fooled.                           #
    # When computing an update step, first normalize the gradient:               #
    #   dX = learning_rate * g / ||g||_2                                         #
    ##############################################################################
    # your code

    for i in range(100):
        output = model(X_fooling)
        _, ind = torch.sort(output, descending=True)

        # Stop once the model classifies the image as the target class
        if ind[0][0] == target_y:
            print('Target label', target_y, 'reached after', i, 'iterations', '\n')
            break

        # Score of the target class (raw, unnormalized)
        score = output[0][target_y]
        score.backward()

        # Normalized gradient-ascent step: dX = learning_rate * g / ||g||_2
        g = X_fooling.grad / torch.norm(X_fooling.grad)
        with torch.no_grad():
            X_fooling = X_fooling + learning_rate * g
        X_fooling.requires_grad = True
       


    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    X_fooling = X_fooling.detach()
    
    return X_fooling
In [17]:
##############################################################################
# TODO: 1. Compute the fooling images for the images under `Project1/images`.#
#       2. Show the 4 related images of the image '182.png': original image, #
#          fooling image, 182_1x_diff.png and 182_10x_diff.png.              #
##############################################################################
# your code
filenames = ['85','100','182','294','366','662']

for filename in filenames:
    print('Fooling image', filename)
    image_orig = PIL.Image.open('images/'+filename+'.png')
    image_t = preprocess(image_orig).to(device)

    ### Generate fooling image
    target_y = 2
    image_fool = make_adversarial_attack(image_t, target_y, model)
    
    image_t = image_t.cpu()
    image_fool = image_fool.cpu()
    image_fool = deprocess(image_fool)

    ## Generate diff images
    image_orig_arr = np.array(deprocess(image_t)).astype(float)
    image_fool_arr = np.array(image_fool).astype(float)
        
    image_diff_arr = image_fool_arr-image_orig_arr+127
    image_diff_10_arr = 10*(image_fool_arr-image_orig_arr)+127
    
    image_diff = PIL.Image.fromarray(image_diff_arr.astype(np.uint8))
    image_diff_10 = PIL.Image.fromarray(image_diff_10_arr.astype(np.uint8))

    
    ### Save images
    image_fool.save('fooling_images/'+filename+'_'+str(target_y)+'.png')
    image_diff.save('fooling_images/'+filename+'_1x_diff.png')
    image_diff_10.save('fooling_images/'+filename+'_10x_diff.png')

    
    ### Plot results
    fig, axs = plt.subplots(1, 4,figsize=(15,5))

    axs[0].imshow(image_orig)
    axs[0].set_title('Original Image')
    axs[1].imshow(image_fool)
    axs[1].set_title('Fooled Image')
    axs[2].imshow(image_diff)
    axs[2].set_title('Image Diff x1')
    axs[3].imshow(image_diff_10)
    axs[3].set_title('Image Diff x10')

    
#### Optional task to generate ostrich image
image_orig = PIL.Image.open('images/182.png')
image_t = preprocess(image_orig).to(device)

### Generate fooling image
print('Generating ostrich fooling image for 182')

target_y = 9
image_fool = make_adversarial_attack(image_t, target_y, model)

### Check classification of image_fool
output = model(image_fool)
_, ind = torch.max(output, 1)

print('Fooling image classified as', ind.item())






##############################################################################
#                             END OF YOUR CODE                               #
##############################################################################
Fooling image 85
Target label 2 reached after 11 iterations 

Fooling image 100
Target label 2 reached after 2 iterations 

Fooling image 182
Target label 2 reached after 7 iterations 

Fooling image 294
Target label 2 reached after 15 iterations 

Fooling image 366
Target label 2 reached after 12 iterations 

Fooling image 662
Target label 2 reached after 7 iterations 

Generating ostrich fooling image for 182
Target label 9 reached after 3 iterations 

Fooling image classified as 9

Task#2 Class Visualization

By starting with a random noise image and performing gradient ascent on a target class, we can generate an image that the network will recognize as the target class. This idea was first presented in [2]; [3] extended this idea by suggesting several regularization techniques that can improve the quality of the generated image.

Concretely, let $I$ be an image and let $y$ be a target class. Let $s_y(I)$ be the score that a convolutional network assigns to the image $I$ for class $y$; note that these are raw unnormalized scores, not class probabilities. We wish to generate an image $I^*$ that achieves a high score for the class $y$ by solving the problem

$$ I^* = \arg\max_I (s_y(I) - R(I)) $$

where $R$ is a (possibly implicit) regularizer (note the sign of $R(I)$ in the argmax: we want to minimize this regularization term). We can solve this optimization problem using gradient ascent, computing gradients with respect to the generated image. We will use (explicit) L2 regularization of the form

$$ R(I) = \lambda \|I\|_2^2 $$

and implicit regularization as suggested by [3], periodically blurring the generated image.
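A single update step for this objective can be sketched as follows (a minimal sketch; `score_fn` is an assumed placeholder for $s_y$, mapping an image tensor to the raw target-class score):

```python
import torch

def ascent_step(I, score_fn, learning_rate=25.0, l2_reg=1e-3):
    """One gradient-ascent step on s_y(I) - l2_reg * ||I||_2^2."""
    I = I.clone().detach().requires_grad_(True)
    # Regularized objective: maximize the score, penalize the L2 norm
    objective = score_fn(I) - l2_reg * I.pow(2).sum()
    objective.backward()
    with torch.no_grad():
        I = I + learning_rate * I.grad  # ascend the regularized objective
    return I.detach()
```

The jitter and blur steps in the full implementation below act as additional implicit regularizers on top of this update.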

In the cell below, complete the implementation of the create_class_visualization function.

[2] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", ICLR Workshop 2014

[3] Yosinski et al, "Understanding Neural Networks Through Deep Visualization", ICML 2015 Deep Learning Workshop

In [6]:
def jitter(X, ox, oy):
    """
    Helper function to randomly jitter an image.
  
    Inputs
    - X: PyTorch Tensor of shape (N, C, H, W)
    - ox, oy: Integers giving number of pixels to jitter along W and H axes
    
    Returns: A new PyTorch Tensor of shape (N, C, H, W)
    """
    if ox != 0:
        left = X[:, :, :, :-ox]
        right = X[:, :, :, -ox:]
        X = torch.cat([right, left], dim=3)
    if oy != 0:
        top = X[:, :, :-oy]
        bottom = X[:, :, -oy:]
        X = torch.cat([bottom, top], dim=2)
    return X
In [18]:
def create_class_visualization(target_y, model, device, **kwargs):
    '''
    Generate an image to maximize the score of target_y under a pretrained model.
    
    Inputs:
    - target_y: A list of two elements, where the first value is an integer in the range [0, 1000) giving the index of the
                class, and the second value is the name of the class.
    - model: A pretrained CNN that will be used to generate the image
    - device: Device (CPU or GPU) to use for computations
    
    Keyword arguments:
    - l2_reg: Strength of L2 regularization on the image
    - learning_rate: How big of a step to take
    - num_iterations: How many iterations to use
    - blur_every: How often to blur the image as an implicit regularizer
    - max_jitter: How much to jitter the image as an implicit regularizer
    - show_every: How often to show the intermediate result
    '''
    model.to(device)
    l2_reg = kwargs.pop('l2_reg', 1e-3)
    learning_rate = kwargs.pop('learning_rate', 25)
    num_iterations = kwargs.pop('num_iterations', 100)
    blur_every = kwargs.pop('blur_every', 10)
    max_jitter = kwargs.pop('max_jitter', 16)
    show_every = kwargs.pop('show_every', 25)
    
    # Randomly initialize the image as a PyTorch Tensor, and make it requires gradient.
    img = torch.randn(1, 3, 224, 224).to(device).requires_grad_()

    for t in range(num_iterations):
        # Randomly jitter the image a bit; this gives slightly nicer results
        ox, oy = random.randint(0, max_jitter), random.randint(0, max_jitter)
        img.data.copy_(jitter(img.data, ox, oy))

        ########################################################################
        # TODO: Use the model to compute the gradient of the score for the     #
        # class target_y with respect to the pixels of the image, and make a   #
        # gradient step on the image using the learning rate. Don't forget the #
        # L2 regularization term!                                              #
        # Be very careful about the signs of elements in your code.            #
        ########################################################################
        # your code
        output = model(img)

        # Score of the target class (raw, unnormalized)
        score = output[0][target_y[0]]
        score.backward()

        # Normalize the gradient, then ascend with L2 regularization
        g = img.grad / torch.norm(img.grad)
        with torch.no_grad():
            img = img + learning_rate * g - l2_reg * img

        img.requires_grad = True

        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

        # Undo the random jitter
        img.data.copy_(jitter(img.data, -ox, -oy))

        # As regularizer, clamp and periodically blur the image
        for c in range(3):
            lo = float(-IMAGENET_MEAN[c] / IMAGENET_STD[c])
            hi = float((1.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c])
            img.data[:, c].clamp_(min=lo, max=hi)
        if t % blur_every == 0:
            blur_image(img.data, sigma=0.5)

        # Periodically show the image
        if t == 0 or (t + 1) % show_every == 0 or t == num_iterations - 1:
            plt.imshow(deprocess(img.data.clone().cpu()))
            class_name = target_y[1]
            plt.title('%s\nIteration %d / %d' % (class_name, t + 1, num_iterations))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()

    return deprocess(img.data.cpu())

Once you have completed the implementation in the cell above, run the following cell to generate an image of a Tarantula:

In [19]:
target_y = [76, "Tarantula"]
# target_y = [366, "Gorilla"]
out = create_class_visualization(target_y, model, device)

Task#3 Style Transfer

Another task closely related to image gradients is style transfer, which has become a "cool" application of deep learning in computer vision. You need to study and implement the style transfer technique presented in the following paper [4], where the general idea is to take two images (a content image and a style image) and produce a new image that reflects the content of one and the artistic "style" of the other.

[4] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Below is an example.

Compute the loss

To perform style transfer, you will need to first formulate a special loss function that matches the content and style of each respective image in the feature space, and then perform gradient descent on the pixels of the image itself.

The loss function contains two parts: content loss and style loss. Read the paper [4] for details about the losses and implement them below.
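For reference, the losses in [4] take the following form. The content loss compares the layer-$\ell$ feature maps $F^\ell$ of the current image and $P^\ell$ of the content image:

$$ \mathcal{L}_{content} = \frac{1}{2} \sum_{i,j} \left( F^\ell_{ij} - P^\ell_{ij} \right)^2 $$

The style loss compares the Gram matrices $G^\ell_{ij} = \sum_k F^\ell_{ik} F^\ell_{jk}$ of the current image against $A^\ell$ of the style image, summed over the chosen layers with weights $w_\ell$:

$$ \mathcal{L}_{style} = \sum_\ell \frac{w_\ell}{4 N_\ell^2 M_\ell^2} \sum_{i,j} \left( G^\ell_{ij} - A^\ell_{ij} \right)^2 $$

where $N_\ell$ is the number of filters and $M_\ell$ the spatial size of the feature maps at layer $\ell$. The implementations below may fold the normalization constants into the Gram computation or the layer weights.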

In [9]:
def content_loss(content_weight, content_current, content_original):
    """
    Compute the content loss for style transfer.
    
    Inputs:
    - content_weight: Scalar giving the weighting for the content loss.
    - content_current: features of the current image; this is a PyTorch Tensor of shape
      (1, C_l, H_l, W_l).
    - content_target: features of the content image, Tensor with shape (1, C_l, H_l, W_l).
    
    Returns:
    - scalar content loss
    """
    
    ##############################################################################
    # TODO: Implement content loss function                                      #
    # Note: It should not be very much code (less than 10 lines)                 #
    ##############################################################################
    
    loss = torch.nn.MSELoss(reduction='sum')
    loss_content = content_weight * loss(content_current, content_original)
    
    return loss_content
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################


def gram_matrix(features):
    """
    Compute the normalized Gram matrix from features.
    The Gram matrix will be used to compute style loss.
    
    Inputs:
    - features: PyTorch Tensor of shape (N, C, H, W) giving features for
      a batch of N images.
    
    Returns:
    - gram: PyTorch Tensor of shape (N, C, C) giving the
      normalized Gram matrices for the N input images.
    """    
    ##############################################################################
    # TODO: Implement the normalized Gram matrix compuation function             #
    # Note: It should not be very much code (less than 10 lines)                 #
    ##############################################################################

    a, b, c, d = features.size()  # a = batch size (= 1 here)

    features = features.view(a * b, c * d).to(device)  # flatten each feature map

    G = torch.mm(features, features.t())  # compute the Gram product

    return G.div(2 * a * b * c * d)  # normalize by the feature map size
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################


def style_loss(feats, style_layers, style_targets, style_weights):
    """
    Computes the style loss at a set of layers.
    
    Inputs:
    - feats: list of the features at every layer of the current image.
    - style_layers: List of layer indices into feats giving the layers to include in the
      style loss.
    - style_targets: List of the same length as style_layers, where style_targets[i] is
      a PyTorch Variable giving the Gram matrix of the source style image computed at
      layer style_layers[i].
    - style_weights: List of the same length as style_layers, where style_weights[i]
      is a scalar giving the weight for the style loss at layer style_layers[i].
      
    Returns:
    - style_loss: A PyTorch Tensor holding a scalar giving the style loss.
    """
    
    ##############################################################################
    # TODO: Implement style loss function                                        #
    # Note: It should not be very much code (less than 10 lines)                 #
    ##############################################################################
    # your code

    loss = torch.nn.MSELoss(reduction='sum')
  
    losses = []
    temp_loss = []
    for i,layer in enumerate(style_layers):
        image_gram = gram_matrix(feats[layer])
        targets_gram = gram_matrix(style_targets[i])
        losses.append(loss(image_gram, targets_gram))
        
    ## To try to balance out losses in each layer
    mean_loss = sum(losses).data/len(style_layers)

    for i,l in enumerate(losses):
        temp_loss.append((mean_loss/l.data)*l*style_weights[i])

    total_loss = sum(temp_loss)/len(style_layers)
    
    return total_loss
    
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################

Putting them together

With these loss functions, you can now build your style transfer model. Implement the function below to perform style transfer. To test the model, you can use the content and style images that we have provided in Project1/style, or improvise using any image you like. Please save your output images in the Project1/style folder.

Design and carry out some experiments (on your own!) to analyse how the choice of layers and the weights will influence the output image. Write down your observations and analysis in the Markdown cell provided below.

In [33]:
def style_transfer(content_image, style_image, content_layer, content_weight,
                   style_layers, style_weights, max_iter):
    """
    Run style transfer!
    You may first resize the image to a small size for fast computation.
    
    Inputs:
    - content_image: filename of content image
    - style_image: filename of style image
    - content_layer: an index indicating which layer to use for content loss
    - content_weight: weighting on content loss
    - style_layers: list of indices indicating which layers to use for style loss
    - style_weights: list of weights to use for each layer in style_layers
    - max_iter: max iterations of gradient updates
    
    Returns:
    - output_image: an image with content from the content_image and 
    style from the style image
    """
    ##############################################################################
    # TODO: Implement the function for style transfer.                           #
    ##############################################################################
    alpha = 1e-3
    beta = 100
    clamp_range = 2
    
    style_image = preprocess(PIL.Image.open('style/'+ style_image +'.jpg'),size=(448,448)).to(device)
    content_image = preprocess(PIL.Image.open('style/'+content_image + '.jpg'),size=(448,448)).to(device)

    ### Plot content and style images
    plt.imshow(deprocess(content_image.cpu()))
    plt.title('Content Image')
    plt.show()
    
    plt.imshow(deprocess(style_image.cpu()))
    plt.title('Style Image')
    plt.show()

    ### Extract features for content and style images
    featureExtractor = FeatureExtractor(model.features)
    content_features = featureExtractor(content_image)
    style_features = featureExtractor(style_image)
    style_targets =  [style_features[i] for i in style_layers]
        
    ## Initialise random image
    img = torch.randn(1, 3, 448, 448).to(device).requires_grad_()
#     img = content_image.clone().to(device)


    ### First training stage
    optimizer = torch.optim.Adam([img.requires_grad_()], lr=0.1)

    for t in range(max_iter):
 
        img.data.clamp_(-clamp_range,clamp_range)
       
        optimizer.zero_grad()
        img_features = featureExtractor(img)

        loss_c = alpha*content_loss(content_weight, img_features[content_layer], content_features[content_layer])
        loss_s = beta*style_loss(img_features, style_layers, style_targets, style_weights)
        loss_total = loss_c+loss_s   
        
        loss_total.backward()
        optimizer.step()

        if t == 0 or (t + 1) % 500 == 0 or t == max_iter - 1:
            print("step: %d, total_loss: %f, content loss: %f, style loss: %f" 
                  %(t+1,loss_total,loss_c,loss_s))
            plt.imshow(deprocess(img.data.clone().clamp_(-clamp_range,clamp_range).cpu(),should_rescale=True))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()

            
    print('Starting second stage')
    optimizer = torch.optim.Adam([img.requires_grad_()], lr=0.005)

    ### Second step
    for t in range(max_iter): 
        img.data.clamp_(-clamp_range,clamp_range)

        optimizer.zero_grad()
        img_features = featureExtractor(img)

        loss_c = alpha*content_loss(content_weight, img_features[content_layer], content_features[content_layer])
        loss_s = beta*style_loss(img_features, style_layers, style_targets, style_weights)

        loss_total = loss_c+loss_s
        loss_total.backward()
        optimizer.step()

        if t == 0 or (t + 1) % 500 == 0 or t == max_iter - 1:
            print("step: %d, total_loss: %f, content loss: %f, style loss: %f" 
                  %(t+1,loss_total,loss_c,loss_s))
            plt.imshow(deprocess(img.data.clone().clamp_(-clamp_range,clamp_range).cpu(),should_rescale=True))
            plt.gcf().set_size_inches(4, 4)
            plt.axis('off')
            plt.show()

    print('Complete')
    final_image = deprocess(img.data.clone().cpu())
    # Note: this relies on the globals style_image_file / content_image_file from
    # the calling cell, since the string arguments were overwritten with tensors above.
    final_image.save('style/Results/'+style_image_file+'+'+content_image_file+'.png')
    return final_image
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ############################################################################## 
In [39]:
##############################################################################
# TODO: 1. Choose one pair of images under 'Project1/style', and finish the  #
#          neural style transfer task by calling the style_transfer function.#
#       2. Show the 3 related images: content image, style image and the     #
#          generated style-transferred image.                                #
##############################################################################
### Feature extractor to get feature maps
class FeatureExtractor(torch.nn.Module):
    def __init__(self, submodule):
        super(FeatureExtractor, self).__init__()
        self.submodule = submodule

    def forward(self, x):
        outputs = []
        for name, module in self.submodule._modules.items():
            x = module(x)
            outputs.append(x)

        return outputs
    
    
style_image_file = 'starry_night'
content_image_file = 'tubingen'


content_layer = 3
content_weight = 1
style_layers =[1,2,3,4,6,7,9,10,11,12]
style_weights = np.array([10,10,1,1,1,1,2,2,2,2])
style_weights = style_weights/np.sum(style_weights)
max_iter = 1000

output_img = style_transfer(content_image_file, style_image_file, content_layer, content_weight, style_layers, style_weights, max_iter)

plt.imshow(output_img)
plt.gcf().set_size_inches(6, 6)
plt.axis('off')
plt.title('Final Image')
plt.show()





##############################################################################
#                             END OF YOUR CODE                               #
##############################################################################
step: 1, total_loss: 142769.109375, content loss: 8824.881836, style loss: 133944.234375
step: 500, total_loss: 2021.558105, content loss: 1362.535522, style loss: 659.022644
step: 1000, total_loss: 2014.356934, content loss: 1294.741333, style loss: 719.615540
Starting second stage
step: 1, total_loss: 1874.894531, content loss: 1298.750854, style loss: 576.143616
step: 500, total_loss: 1557.921875, content loss: 1250.566040, style loss: 307.355896
step: 1000, total_loss: 1543.551514, content loss: 1239.834595, style loss: 303.716980
Complete

Write your observations and analysis in this Markdown cell:


Style Transfer

The images were optimized in two stages using the Adam optimizer. The first stage uses a higher learning rate of 0.1 to speed up gradient descent, and the second stage uses a learning rate of 0.005 to prevent overshooting. The results can be found in the 'style/Results' folder.

From experimentation, good results were obtained with the following parameters:

  • Content layer: 3
  • Content weight: 1
  • Style layers: 1,2,3,4,6,7,9,10,11,12
  • Style Weights: (10,10,1,1,1,1,2,2,2,2)
  • $\alpha$: 1e-3
  • $\beta$: 100

Some results obtained with these parameters are shown below:

Image1.png

Choice of $\alpha$ and $\beta$

$\alpha$ and $\beta$ determine how much emphasis to place on the content and style images respectively. To verify this, $\alpha$ was fixed at 1e-3 while $\beta$ was set to 20, 100 and 500. The results shown below demonstrate that, as $\beta$ increases, the output image becomes less like the content image, as expected.

The parameters used were

  • Content layer: 3
  • Content weight: 1
  • Style layers: 1,2,3,4,6,7,9,11,12
  • Style Weights: (5,5,1,1,1,1,2,2,2,2)
  • $\alpha$: 1e-3
  • $\beta$: Varied to measure effect

Image2.png

$\beta$=10 (left), $\beta$=100 (middle), $\beta$=500 (right)

Choice of Style Layer

SqueezeNet has 13 feature layers: a conv2d and a ReLU at layers 0 and 1, max-pool layers at layers 2, 5 and 8, and Fire modules for the rest. Fire modules are themselves composed of conv2d and ReLU layers. To study how the choice of style layers affects the output image, the number of style layers was gradually increased with the following parameters:

  • Content image: tubingen
  • Style image: starry_night
  • Content layer: 5
  • Content weight: 1
  • Style layers: Varied to measure effect
  • Style Weights: All 1
  • $\alpha$: 1e-3
  • $\beta$: 100

The results are shown below. It was observed that the lower layers (1-3) were mainly responsible for actual pixel values, the middle layers (4-6) for textures such as brush strokes, and the upper layers (7-12) for higher-level features such as the circular swirls in the style image.

Image3.png

Style transfer using layers 1-3 (left), 1-7(middle), 1-12(right)

Choice of style weights

With an idea of what each layer represents, the weights can be chosen to emphasize specific features of the style image. This is shown by using 'engineering' as the content image and 'starry_night' as the style image. By choosing specific weights, it is possible to choose between a 'night' scene or a 'day' scene for the output image. The following parameters were used:

  • Content image: engineering
  • Style image: starry_night
  • Content layer: 3
  • Content weight: 1
  • Style layers: 1 to 12
  • Style Weights: (1,1,1,1,1,1,1,1,5,5,5,5) and (10,10,1,1,1,1,1,1,1,1,1,1)
  • $\alpha$: 1e-3
  • $\beta$: 500

By setting lower weights for the lower style layers, we can preserve the colours of the content image and produce a day scene with the brush strokes of the style image. By setting higher weights for the lower style layers, the output image takes the colour of the style image and produces a night scene.

Image4.png

Higher style weights for layer 9-12 (left), Higher style weights for layer 1-2 (right)

Choice of content layer

Similarly, the effect of the content-layer choice follows this understanding of what each layer represents. When a lower layer is used, the result more closely resembles the actual content image. This is because lower layers more accurately represent actual pixel values, while higher layers represent higher-level features such as edges, textures and shapes. This is shown below for different content layers:

  • Content image: tubingen
  • Style image: starry_night
  • Content layer: Varied to measure effect
  • Content weight: 1
  • Style layer: (1,2,3,4,6,7,9)
  • Style Weights: All 1
  • $\alpha$: 1e-3
  • $\beta$: 100

The result is shown below for content layer = 1, 3, 5, 7, 9. As the layer number increases, more abstract features are kept instead of actual pixel values.

Image5.png

Output image as the selected content layer increases

Conclusion

Neural style transfer demonstrates that neural networks can be used not just to perform classification but also to understand images. A trained neural network stores information about the features in an image, with higher layers containing higher-level information.

After studying the various parameters, the following values were found to work well:

  • Content layer: 3
  • Content weight: 1
  • Style layers: 1,2,3,4,6,7,9,10,11,12
  • Style Weights: (10,10,1,1,1,1,2,2,2,2)
  • $\alpha$: 1e-3
  • $\beta$: 100
In [ ]: